1 Dataset Description

A total of 7880 individuals from 2611 families were genotyped on Illumina Human 1Mv1.

  • 4901 males, 2979 females.
  • 2571 trios, 36 quads, 1 family of five, 3 families of six.
  • 947,233 SNPs were genotyped.
  • Coordinates were based on Build36.
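
The family and individual counts above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch; it assumes that a trio contributes 3 genotyped members, a quad 4, and the two larger family types 5 and 6, which the report implies but does not state.

```python
# Family-size counts as listed above: {members per family: number of families}.
# The member counts per family type (3, 4, 5, 6) are an assumption.
family_counts = {3: 2571, 4: 36, 5: 1, 6: 3}

n_families = sum(family_counts.values())
n_individuals = sum(size * n for size, n in family_counts.items())
n_by_sex = 4901 + 2979  # males + females as reported

print(n_families)     # 2611
print(n_individuals)  # 7880
print(n_by_sex)       # 7880
```

All three totals agree with the figures given in the first sentence of this section.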


2 Raw Genotype QC

2.1 Sex Check

  • 141 samples flagged as PROBLEM
    • 115 with completely missing chrX genotypes.
    • 26 with chrX-F ranging from 0.20 to 0.62.
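
The sketch below illustrates how a chrX-F value can be turned into a sex call and a status flag. It assumes the PLINK `--check-sex` default thresholds (F < 0.2 called female, F > 0.8 called male); the report does not state which thresholds were actually used, and the function names are hypothetical.

```python
def call_sex_from_f(f, female_max=0.2, male_min=0.8):
    """Return 'F', 'M', or None (no call) from a chrX inbreeding estimate.

    The default thresholds are an assumption (PLINK --check-sex defaults).
    f is None when chrX genotypes are completely missing.
    """
    if f is None:
        return None
    if f < female_max:
        return "F"
    if f > male_min:
        return "M"
    return None  # ambiguous: between the two thresholds


def sexcheck_status(reported_sex, f):
    """Return 'OK' when the genotype-based call matches the reported sex."""
    called = call_sex_from_f(f)
    return "OK" if called is not None and called == reported_sex else "PROBLEM"


# Both ends of the reported 0.20 to 0.62 range give no call under these thresholds.
print(call_sex_from_f(0.20), call_sex_from_f(0.62))  # None None
print(sexcheck_status("F", None))                    # PROBLEM
```

Under these assumed thresholds, both groups listed above (missing chrX genotypes, and chrX-F between 0.20 and 0.62) end up without a sex call, which is consistent with a PROBLEM status.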

2.1.1 Mismatch summary



2.1.2 ChrX F distributions



2.2 Pairwise IBD estimation

  • Relationships (RT): OT (Others), FS (Full Siblings), PO (Parent Offspring)
  • Family ID 483 has a potential issue:
    • inbreeding coefficient = 1 between IID:328 (Female) and IID:1491 (Female).
    • Possibly MZ twins, or the same individual genotyped twice?
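
For reference, a common pairwise summary from IBD estimation is the proportion of the genome shared IBD, PI_HAT = P(IBD=2) + 0.5 × P(IBD=1). The sketch below computes it from the three IBD-sharing probabilities and flags pairs that look like MZ twins or duplicated samples. Whether the value of 1 reported for family 483 refers to this statistic is an assumption, and the 0.9 cutoff and function names are illustrative only.

```python
def pi_hat(z1, z2):
    """Proportion IBD from P(IBD=1) = z1 and P(IBD=2) = z2."""
    return 0.5 * z1 + z2


def is_possible_duplicate(z0, z1, z2, threshold=0.9):
    """Flag a pair as possible MZ twins / duplicate (threshold is an assumption).

    z0 = P(IBD=0) is accepted for completeness but not needed for PI_HAT.
    """
    return pi_hat(z1, z2) >= threshold


# Theoretical (z0, z1, z2) values for the relationship types used in this report.
expected = {
    "OT (unrelated)": (1.0, 0.0, 0.0),
    "PO": (0.0, 1.0, 0.0),
    "FS": (0.25, 0.5, 0.25),
    "MZ or duplicate": (0.0, 0.0, 1.0),
}
for label, (z0, z1, z2) in expected.items():
    print(label, pi_hat(z1, z2), is_possible_duplicate(z0, z1, z2))
```

Parent-offspring and full-sibling pairs both have an expected PI_HAT of 0.5, so only a value near 1 points to MZ twins or a duplicated sample.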

2.2.1 Estimated pairwise IBD distributions



2.2.2 Family 483



2.3 Individual genome-wide heterozygosity
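
As a sketch of the two per-individual quantities plotted in this section, the heterozygosity rate and the inbreeding estimate F can both be derived from homozygous genotype counts. The formulas below follow the method-of-moments estimate reported by PLINK `--het`; that this is how the values in this report were produced is an assumption, and the example counts are made up for illustration.

```python
def het_rate(n_nonmissing, obs_hom):
    """Fraction of non-missing genotypes that are heterozygous."""
    return (n_nonmissing - obs_hom) / n_nonmissing


def inbreeding_f(n_nonmissing, obs_hom, exp_hom):
    """F = (observed hom. - expected hom.) / (non-missing - expected hom.)."""
    return (obs_hom - exp_hom) / (n_nonmissing - exp_hom)


# Illustrative counts only (not taken from the dataset).
n, o_hom, e_hom = 1000, 700, 650.0
print(het_rate(n, o_hom))             # 0.3
print(inbreeding_f(n, o_hom, e_hom))  # 50 / 350, about 0.143
```

A sample with more homozygous genotypes than expected gets a positive F and a lower heterozygosity rate; fewer than expected gives a negative F.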

2.3.1 Genome-wide heterozygosity vs. missing rates



2.3.2 Genome-wide F vs. missing rates



3 Imputation

3.1 Pre-imputation

The imputation pipeline follows that used for the SSC dataset. A total of 7769 individuals, ~784K autosomal SNPs, and ~22K chrX SNPs were used for imputation.

  • filters: --geno 0.05 --mind 0.2 --maf 0.01 --hwe 1e-6
    • 111 people removed due to missing genotype data (--mind).
    • Total genotyping rate in remaining samples is 0.914029.
    • 124565 variants removed due to missing genotype data (--geno).
    • 15633 variants removed due to Hardy-Weinberg exact test.

Note that a liberal threshold of 0.2 was used for the individual genotype missing rate (--mind) for the AGP data, since a large number of individuals have imiss > 0.1. The 111 removed individuals have imiss ranging from 0.7 to 1.
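
The sample and SNP counts reported for the filtering step can be reconciled with simple bookkeeping. This is a sketch using only the numbers given in this report; the count removed by --maf is not itemized, so it is not filled in here.

```python
# Samples: starting count minus those removed by --mind.
n_individuals_start = 7880
n_removed_mind = 111
n_individuals_kept = n_individuals_start - n_removed_mind

# SNPs: starting count minus those removed by --geno and --hwe.
n_snps_start = 947_233
n_removed_geno = 124_565
n_removed_hwe = 15_633
n_snps_after_geno_hwe = n_snps_start - n_removed_geno - n_removed_hwe

print(n_individuals_kept)     # 7769
print(n_snps_after_geno_hwe)  # 807035
```

The 7769 retained individuals match the count stated at the start of this section. The 807,035 SNPs left after the --geno and --hwe filters are close to the ~784K autosomal plus ~22K chrX SNPs (about 806K) used for imputation. The remaining difference of roughly 1K is not broken down in the report.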



3.2 After Imputation

3.2.1 Frequency distribution



3.2.2 PCA